

Section: New Results

Syntax modelling and treebank development

Participants : Djamé Seddah, Héctor Martínez Alonso, Benoît Sagot, Elias Benaissa, Wigdan Abbas Mekki Medeni, Émilia Verzeni.

In 2017, ALMAnaCH members contributed to the Universal Dependencies initiative [44]:

  • Héctor Martínez Alonso resumed his contribution to the Universal Dependencies (UD) initiative, providing annotations and data evaluations for the Catalan, Danish and Spanish datasets.

  • Several ALMAnaCH members worked on converting the French TreeBank into the UD model and format (paper to be presented in 2018) and on the automatic identification of syntactic structures in UD.

As part of the ANR Parsiti project (2016-2020), whose goal is to build the next generation of context-enhanced NLP tools, we are currently developing parallel datasets of user-generated content for two language pairs: French-English and North-African dialectal Arabic-French. Each pair consists of highly non-canonical, heavily contextualized text. We have built the translation pairs and are currently annotating them at the morpho-syntactic level. No such datasets exist yet; they will first be used to evaluate our current processing chains and then serve as training data to bootstrap state-of-the-art models. Three annotators are involved over a one-year period (18 person-months, ending in June 2018).